Abstract

Although ARTMAP and ART-based models were introduced in the early 1970s, they have not been used to characterize and classify ecological observations, even though ART-based models have been used extensively for classification based on satellite imagery. This report, to our knowledge, is the first application of ART-based methods, and specifically ARTMAP, to predicting habitat selection and the spatial distribution of species. We compare the performance of ARTMAP with that of other classification methods in assessing the breeding success of three bird species (Lanius senator, Hippolais pallida, and Calandrella brachydactyla) based on multi-spectral satellite imagery and environmental variables. ARTMAP is superior, both in terms of training performance (percent correctly classified, pcc = 1.00) and generalizability (pcc > 0.96), to feedforward multilayer backpropagation (> 0.87, > 0.65), linear and quadratic discriminant analysis (> 0.48, > 0.46) and k-nearest neighbor (> 0.82, > 0.66) methods. Compared to the other methods, ARTMAP is able to incorporate new observations with far less computational effort and can easily add data to already trained models.

Keywords: ART; ARTMAP; artificial neural networks; backpropagation; pattern recognition; spatial habitat selection; Lanius senator; Hippolais pallida; Calandrella brachydactyla.
1 Introduction



Characterizing observations to explain interactions within ecosystems, communities, and individual species, in order to predict a state, has been one of the main problems in ecology. The inherent complexity of ecological processes, the relatively limited number of possible observations, and their susceptibility to observational and measurement noise are considered among the major difficulties in predicting a state in ecology (Fielding, 1999). Subject to these constraints, efforts to characterize ecological data and predict the state of a given ecosystem or community shifted towards statistical methods and away from box-and-arrow type differential equation models (Ross, 1976; Lassiter and Kearns, 1977). Statistical models proved to be more robust in capturing nonlinearities and in generalizing to new data sets (Moilanen, 1999; DeValpine, 2003).

Several statistical techniques are readily available to ecologists for characterizing observations. These range from simple regression models (Gutierrez et al., 2005; Miller, 2005) to generalized additive (Dunk et al., 2004) and linear models (Ozesmi and Mitsch, 1997; Tan and Beklioglu, 2005); from classification algorithms such as k-nearest neighbor (k-NN) and linear and quadratic discriminant analysis (LDA and QDA, respectively) (Joy and Death, 2003; Maron and Lill, 2004) to, more recently, genetic algorithms (Underwood et al., 2004); pattern recognition methods such as artificial neural networks (Lek et al., 1996; Recknagel et al., 1997; Lek and Guegan, 1999; Ozesmi and Ozesmi, 1999); and, lately, ecological data mining (Chawla et al., 2001). Standard parametric methods such as LDA, QDA and regression are mostly criticized for depending on strong assumptions about the distribution of the underlying data (Hastie et al., 2001), whereas classification and pattern recognition methods require a large number of training points. Artificial neural network-based approaches, for their part, are criticized as black-box models that provide no insight into the complex interactions of ecosystem processes, although they are able to overcome the difficulties associated with traditional statistical models (Bishop, 1995; Ripley, 1996; Hastie et al., 2001). Nevertheless, artificial neural network-based models can provide valuable insight into ecosystem dynamics, as there are several techniques for 'opening the black box' (Ozesmi and Ozesmi, 1999; Olden and Jackson, 2000; Ozesmi et al., 2005).

Recently, backpropagation-based methods have become popular in ecological applications. Their uses range from characterization of habitat selection of phytoplankton (Scardi, 1996, 2001), fish (Reyjol et al., 2001) and bird species (Ozesmi and Ozesmi, 1999), to modeling whole communities and ecosystems (Tan and Smeins, 1996; Tan and Beklioglu, 2005) and characterization of wildlife damage (Spitz and Lek, 1999), with the aim of gaining insight into the dynamical structure of ecosystems. However, the main drawback of backpropagation-based methods has been that they are inherently off-line, that is, iterative methods that use all the available data at once. In other words, each time a new observation is made, these models must be retrained on the whole data set in order to include the new observation, which requires a significant amount of computational resources and time. In addition, the performance of these methods, particularly their generalizability, decreases significantly when the number of data points is limited, which renders this approach impractical, at least in ecology, where the number of observations is commonly limited.

This report aims to introduce another statistical pattern recognition model, ARTMAP, based on adaptive resonance theory (ART) (Grossberg, 1976a,b), which is relatively unfamiliar to the ecological community. ART was originally developed during the early 1970s to explain cortico-cortical interactions for object recognition and learning in the brain (Grossberg, 1976a). During the 1980s and early 1990s, ART was extended as a pattern recognition and classification algorithm and successfully applied to several benchmark technological data sets and to the classification of satellite imagery data (Grossberg, 1988; Carpenter et al., 1991c, 1997). However, despite its long history as a statistical pattern recognition and classification algorithm, this report is, to our knowledge, the first application of an ART-based algorithm to an ecological data set. In addition to being on-line (that is, a non-iterative learning algorithm, which enables easy and fast incorporation of new observations into an already trained model), ARTMAP also performs significantly better on the data set considered here while using considerably less computational time. To that end, we used satellite-based multi-spectral data and environmental variables to predict the occurrence of three bird species of Southeastern Anatolia, namely woodchat shrike Lanius senator (Linnaeus, 1758), olivaceous warbler Hippolais pallida (Ehrenberg, 1833), and short-toed lark Calandrella brachydactyla (Leisler, 1814). To predict the occurrence of the three bird species we used k-NN, LDA, QDA, a feedforward multilayer backpropagation network, and ARTMAP. We provide a discussion of the comparative performances of these different models.



2 Methods 



2.1 Traditional Classification Methods



We compared the performance of the fuzzy ARTMAP model against traditional classification and pattern recognition methods commonly employed in ecological studies. The first method was the k-nearest neighbor method, which is an accepted benchmark classification method if one considers only the training data. Nearest neighbor methods use those observations in the training set T closest in the input space to x to form $\hat{Y}$. More specifically,

$$\hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i \qquad (2.1)$$

where $N_k(x)$ is the neighborhood of x defined by the k closest points $x_i$ in the training sample. When k = 1, k-NN methods can reach the minimum possible classification error on the training set, because each training point is its own nearest neighbor. Note that in this case the error on an independent test set is intuitively expected to be quite high. In addition, we also used LDA and QDA, which are argued to be "amazingly robust" on industrial data sets (Hastie et al., 2001). LDA and QDA enable one to infer the posterior probabilities of the output categories based on the observed data, using Bayes' theorem:



$$P(G = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_{l=1}^{K} f_l(x)\,\pi_l} \qquad (2.2)$$

where $f_k(x)$ is the class-conditional density of X in class G = k, and $\pi_k$ is the prior probability of class k, with $\sum_{k=1}^{K} \pi_k = 1$. LDA and QDA assume Gaussian distributions for the class densities. For two-category cases (as in our case), and assuming that the covariances of the class densities are equal, the linear discriminant function is given as



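$$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k \qquad (2.3)$$

where the parameters of the Gaussian distributions are estimated from the data as

$$\hat{\pi}_k = \frac{N_k}{N} \qquad (2.4)$$

$$\hat{\mu}_k = \frac{1}{N_k} \sum_{i \in \text{class } k} x_i \qquad (2.5)$$

$$\hat{\Sigma} = \frac{1}{N - K} \sum_{k=1}^{K} \sum_{i \in \text{class } k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T \qquad (2.6)$$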

where $N_k$ is the number of class-k observations. An equivalent decision rule is given as $G(x) = \arg\max_k \delta_k(x)$. If the equality assumption of class covariances does not hold, we obtain the quadratic discriminant function

$$\delta_k(x) = -\frac{1}{2} \log|\Sigma_k| - \frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \log \pi_k \qquad (2.7)$$

with the decision boundary between each pair of classes k and l described by the quadratic equation $\{x : \delta_k(x) = \delta_l(x)\}$. A more in-depth discussion of these two methods, along with the k-NN method, can be found in Hastie et al. (2001).
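Of the three methods above, the nearest-neighbor rule of equation 2.1 is the simplest to illustrate. The following is a minimal sketch in Python, not the R implementation used in this study; the function name `knn_predict`, the Euclidean distance, the 0.5 threshold and the toy data are all assumptions made for illustration.

```python
from math import dist

def knn_predict(train_x, train_y, x, k=1):
    """Equation 2.1: average the labels of the k training points closest to x."""
    # Rank training points by Euclidean distance to the query point x.
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    neighbors = order[:k]
    y_hat = sum(train_y[i] for i in neighbors) / k
    # For 0/1 labels, thresholding the average at 0.5 is a majority vote.
    return 1 if y_hat > 0.5 else 0

# Hypothetical toy data: two features, binary absence/presence labels.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]
train_y = [0, 0, 1, 1]

# With k = 1 each training point is its own nearest neighbor,
# so the training set is reproduced without error.
train_preds = [knn_predict(train_x, train_y, x, k=1) for x in train_x]
new_pred = knn_predict(train_x, train_y, (0.85, 0.85), k=3)
```

The k = 1 case shows why training error alone is a poor guide for this method: perfect recall of the training set says nothing about performance on an independent test set.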

Traditional classification methods have often been criticized because they require strong assumptions about the underlying distribution of the observations (Ripley, 1996; Hastie et al., 2001). To overcome this problem, connectionist artificial neural network based approaches, such as the feedforward multilayer backpropagation network, have recently become popular in ecological modeling (Scardi, 1996, 2001; Tan and Beklioglu, 2005). Although the ART and ARTMAP family of models is another type of artificial neural network, these models differ from connectionist approaches in several aspects (Carpenter et al., 1991a,b,c, 1992). For that reason, we also compared the performance of the fuzzy ARTMAP model to that of a generalized linear model (GLM) and of a multilayer feedforward backpropagation model.



2.2 ARTMAP

Briefly, the ARTMAP architecture consists of two ART modules, which are self-organizing maps (Carpenter et al., 1991a), one for the input space and one for the output space (figure 1; $\mathrm{ART}_a$ and $\mathrm{ART}_b$, respectively). Learning occurs in each ART module independently: whenever an expected category matches the presented input pattern, or a novel input pattern is encountered, categories are formed in both ART modules and mapped onto an associative learning map field. Thus, ARTMAP models represent a "pseudo-supervised" learning method (Carpenter et al., 1991a). There are several variants of ART modules (Carpenter and Grossberg, 1990; Carpenter et al., 1991b,c). Here, we used fuzzy ART modules, which were developed as pattern recognition methods for data sets with continuous input space (Carpenter et al., 1991c, 1992). In short, each fuzzy ART system contains an input field $F_0$ and a field $F_1$ receiving bottom-up signals from $F_0$ and top-down input from $F_2$, the latter of which represents the active category (figure 1). So-called complement coding (Carpenter et al., 1992) should be employed before feeding the input vectors to fuzzy ART modules. Theoretical considerations for this requirement are discussed in detail in Carpenter et al. (1992). Complement coding means that an N x P-dimensional input matrix a is coded and fed to the model as an N x 2P-dimensional matrix $[a, a^c]$, where $a_i^c = 1 - a_i$.
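Complement coding can be illustrated for a single input row as follows. This is a minimal Python sketch rather than the Matlab code used in this study; the function name `complement_code` and the example values are hypothetical.

```python
def complement_code(a):
    """Map a P-dimensional input in [0, 1]^P to the 2P-dimensional vector [a, 1 - a]."""
    if any(v < 0.0 or v > 1.0 for v in a):
        raise ValueError("inputs must be scaled to [0, 1] before complement coding")
    return list(a) + [1.0 - v for v in a]

# Hypothetical input with P = 3 features already scaled to [0, 1].
coded = complement_code([0.2, 0.7, 1.0])

# The components of a complement-coded vector always sum to P,
# because each pair a_i + (1 - a_i) contributes exactly 1.
total = sum(coded)
```

Because every coded vector has the same component sum, no input can dominate the category choice merely by having large feature values.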

Each $F_2$ category node j has an associated weight vector, whose components are initially set to 1. Each weight $w_{ji}$ is monotonically non-increasing with time, and hence its convergence to a limit is guaranteed (Carpenter et al., 1991c, 1992). Fuzzy ART dynamics depend on a choice parameter $\alpha > 0$, a learning rate $\beta \in [0, 1]$, and a vigilance parameter $\rho \in [0, 1]$. For each given input pattern I and jth node of the $F_2$ layer, the choice function $T_j$ is defined by



$$T_j(I) = \frac{|I \wedge w_j|}{\alpha + |w_j|} \qquad (2.8)$$

where $\wedge$ is the fuzzy AND operator, equivalent to the component-wise min operator, $|\cdot|$ is the Euclidean norm, and $w_j = (w_{j1}, \ldots, w_{jM})$. The system makes a category choice when at most one $F_2$ node can become active at a given time, and the category choice is given as $T_J = \max\{T_j : j = 1, \ldots, N\}$. In a choice system, the activity of the $F_1$ layer is given as $x = I$ if $F_2$ is inactive and $x = I \wedge w_J$ if the Jth $F_2$ node is selected. Resonance occurs in the ART module if

$$\frac{|I \wedge w_J|}{|I|} \geq \rho \qquad (2.9)$$

and reset occurs otherwise. If reset occurs, the value of the choice function $T_J$ is set to 0, and a new index J is chosen. The search process continues until the chosen J satisfies the resonance criterion (equation 2.9). Once the search ends and resonance occurs, the weight vector $w_J$ is updated by

$$w_J^{(\mathrm{new})} = \beta \left( I \wedge w_J^{(\mathrm{old})} \right) + (1 - \beta)\, w_J^{(\mathrm{old})} \qquad (2.10)$$
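The choice, match and update steps of equations 2.8 to 2.10 can be sketched for a single fuzzy ART module as below. This is an illustrative Python sketch, not the implementation used in this study. It takes $|\cdot|$ to be the sum of components (the city-block norm of the fuzzy ART formulation in Carpenter et al., 1991c), which is an assumption of the sketch, and the function names and parameter defaults are hypothetical.

```python
def fuzzy_and(a, b):
    """Fuzzy AND: component-wise minimum."""
    return [min(x, y) for x, y in zip(a, b)]

def present(I, weights, alpha=0.001, beta=1.0, rho=0.75):
    """Present one complement-coded input I; return the index of the category that learns it."""
    # Choice function (2.8) for every committed category.
    T = [sum(fuzzy_and(I, w)) / (alpha + sum(w)) for w in weights]
    # Visit categories by decreasing T; moving on to the next candidate
    # plays the role of resetting T_J to 0 after a failed match.
    for J in sorted(range(len(weights)), key=lambda j: -T[j]):
        w = weights[J]
        if sum(fuzzy_and(I, w)) / sum(I) >= rho:  # resonance test (2.9)
            # Weight update (2.10); weights can only shrink or stay equal.
            weights[J] = [beta * m + (1 - beta) * o
                          for m, o in zip(fuzzy_and(I, w), w)]
            return J
    # No committed category resonates: commit a new node. Its all-ones
    # weights always pass (2.9), so the update (2.10) is applied directly.
    ones = [1.0] * len(I)
    weights.append([beta * m + (1 - beta) * o
                    for m, o in zip(fuzzy_and(I, ones), ones)])
    return len(weights) - 1

# Hypothetical complement-coded inputs (P = 2, so each vector has 4 components).
weights = []
first = present([0.2, 0.3, 0.8, 0.7], weights)     # commits category 0
second = present([0.25, 0.3, 0.75, 0.7], weights)  # close to category 0: resonance
third = present([0.9, 0.9, 0.1, 0.1], weights)     # fails vigilance: new category
```

Each input is presented once and either refines an existing category or creates a new one, which is what makes the procedure on-line rather than iterative.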

As briefly mentioned above, the fuzzy ARTMAP model consists of two fuzzy ART modules, one for input and one for target vectors, linked by an associative learning network and an internal controller. With reference to figure 1, when a prediction by the $\mathrm{ART}_a$ module, which receives the input vectors, is disconfirmed at the $\mathrm{ART}_b$ module, which receives the target vector, inhibition of map field activation induces the match tracking process. Match tracking raises the $\mathrm{ART}_a$ vigilance $\rho_a$ to just above the $F_1^a$ activation, so that the activation of $F_2^a$ meets the reset criterion (i.e., $\rho_a$ is increased just enough to fail the match criterion given by equation 2.9). This triggers an $\mathrm{ART}_a$ search process, which leads to activation of either an $\mathrm{ART}_a$ category that correctly predicts b at the map field, or a new node that has not been used before (that is, either an already formed category that predicts b is selected, or a new category is created). ART and ARTMAP algorithms are, in essence, similar to k-NN methods with adaptive update of the size of the neighborhood with each pattern encountered in the data. ARTMAP is, nevertheless, a nonlinear algorithm, in that the shapes of the clusters built from the patterns embedded in the input space are nonlinear. For details of the fuzzy ART algorithm as well as its geometrical interpretation, readers are referred to Carpenter et al. (1991c); the details of the fuzzy ARTMAP algorithm can be found in Carpenter et al. (1992).
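Match tracking can be illustrated with a deliberately simplified sketch. The Python code below is not the full two-module architecture of Carpenter et al. (1992): as a labeled simplification, the $\mathrm{ART}_b$ module and map field are replaced by a direct class label per $\mathrm{ART}_a$ category, which is only reasonable for discrete targets such as the 0/1 classes used here. The class name, parameter defaults, city-block norm and toy data are assumptions made for illustration.

```python
def fuzzy_and(a, b):
    return [min(x, y) for x, y in zip(a, b)]

def complement_code(a):
    return list(a) + [1.0 - v for v in a]

class SimplifiedFuzzyARTMAP:
    """ART_a categories with one class label each (stand-in for ART_b and the map field)."""

    def __init__(self, alpha=0.001, beta=1.0, rho_baseline=0.0, eps=1e-6):
        self.alpha, self.beta = alpha, beta
        self.rho_baseline, self.eps = rho_baseline, eps
        self.weights = []  # one weight vector per committed ART_a category
        self.labels = []   # class label predicted by each category

    def _ranked(self, I):
        # Categories ordered by decreasing choice function (2.8).
        T = [sum(fuzzy_and(I, w)) / (self.alpha + sum(w)) for w in self.weights]
        return sorted(range(len(T)), key=lambda j: -T[j])

    def _update(self, I, w):
        # Weight update (2.10).
        return [self.beta * m + (1 - self.beta) * o
                for m, o in zip(fuzzy_and(I, w), w)]

    def train_one(self, a, label):
        I = complement_code(a)
        rho = self.rho_baseline
        for J in self._ranked(I):
            match = sum(fuzzy_and(I, self.weights[J])) / sum(I)
            if match < rho:
                continue  # reset: vigilance criterion (2.9) not met
            if self.labels[J] == label:
                self.weights[J] = self._update(I, self.weights[J])
                return J
            # Wrong prediction: match tracking raises vigilance just above
            # the current match value, forcing the search to continue.
            rho = match + self.eps
        # No existing category predicts the label: commit a new one.
        self.weights.append(self._update(I, [1.0] * len(I)))
        self.labels.append(label)
        return len(self.weights) - 1

    def predict(self, a):
        ranked = self._ranked(complement_code(a))
        return self.labels[ranked[0]] if ranked else None

# Hypothetical observations with P = 2 features scaled to [0, 1].
model = SimplifiedFuzzyARTMAP()
for a, y in [((0.1, 0.2), 0), ((0.15, 0.25), 0), ((0.8, 0.9), 1)]:
    model.train_one(a, y)
n_before = len(model.weights)
# A new observation is absorbed in a single call, without revisiting old data.
model.train_one((0.9, 0.8), 1)
```

The final call shows the on-line property discussed in this report: adding an observation touches one category and leaves the rest of the trained model unchanged.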
Although new to ecology, ART and ARTMAP theory has been under development since the early 1970s, and the reader is referred to Cohen and Grossberg (1983) and Grossberg (1988) for theoretical considerations. Generic implementation issues are discussed in Carpenter (2003).

2.3 Implementation Details

2.3.1 Data

The ornithological and ecological data used in this study were obtained from the GAP biodiversity research project of the Turkish Society for the Conservation of Nature (DHKD), conducted between 2001 and 2003 (Welch, 2004). A detailed description of the observations and the data collection method can be found in Kurt (2004) and Welch (2004).

During the field studies, which lasted two years, 1592 points were visited, and the ecological variables as well as the breeding success of bird species were recorded. The satellite imagery used in this study was obtained by the Turkish Society for the Conservation of Nature and consisted of Landsat image bands 1-5 and 7, with a resolution of 30 x 30 m. The characteristics of the satellite images and the properties of the bands used are given in detail in Per (2003) and Kurt (2004).

The independent variables were 6 image bands and 6 environmental variables. The environmental variables were elevation (m), distance to nearest road (m), distance to water (m), vegetation index (categorical), annual relative humidity (%), and annual mean temperature (°C). For all the models considered, the output class for each data pattern was assigned either 0 or 1, depending on the occurrence of individuals recorded for each bird species considered here (Kurt, 2004).

It is important for statistical learning methods to have an input space where the number of data points for each output category (0 and 1, in our case) is approximately balanced, to avoid biased estimates (Ripley, 1996). Although 1592 data points were collected in our data set, the number of data points corresponding to category 1 (i.e., the presence of individuals) was limited (246-274, depending on the species). To establish balance, we randomly selected a number of data points with output category 0 equal to the number of points with breeding individuals (category 1) (Hirzel et al., 2002). Thus, the data fed to the models consisted of 492-548 observations, depending on the bird species considered.

The importance of setting aside independent test data, which should not be included during training, to assess the actual performance of a given model has been rigorously emphasized elsewhere (Ripley, 1996; Ozesmi and Ozesmi, 1999; Hastie et al., 2001; Tan and Beklioglu, 2005). To that end, we randomly split the data set for each species into two sets with equal numbers of data points, such that the numbers of data points corresponding to each category were still balanced, and used one set to train the models and the other to assess the generalizability of the trained models.
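The balancing and splitting procedure described above can be sketched as follows. This is an illustrative Python sketch, not the code used in this study; the function name `balance_and_split`, the fixed random seed and the toy label vector are hypothetical, and the sketch assumes that absences (category 0) outnumber presences (category 1).

```python
import random

def balance_and_split(labels, seed=0):
    """Return (train_idx, test_idx) with balanced classes in both halves."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Undersample absences to match the number of presences
    # (assumes len(neg) >= len(pos); rng.sample raises ValueError otherwise).
    neg = rng.sample(neg, len(pos))
    rng.shuffle(pos)
    train, test = [], []
    # Split each class in half separately so both sets stay balanced.
    for group in (pos, neg):
        half = (len(group) + 1) // 2
        train += group[:half]
        test += group[half:]
    return train, test

# Hypothetical labels: 10 presences among 60 visited points.
labels = [1] * 10 + [0] * 50
train_idx, test_idx = balance_and_split(labels)
```

Splitting within each class, rather than splitting the pooled data, is what keeps the class proportions equal in the training and test sets.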

2.3.2 Traditional Classification Models

The k-NN, LDA, and QDA models were implemented in the R statistical software (R, 1991). The theoretical considerations and implementation details for these models can be found in Hastie et al. (2001). The GLM and backpropagation models were implemented using the NevProp3 software (Goodman, 1996). For the backpropagation models, the architecture of the network was optimized step-wise (Ozesmi et al., 2005), and networks with 8, 3 and 10 hidden units were used as final models for L. senator, H. pallida and C. brachydactyla, respectively. Theoretical considerations for feedforward multilayer backpropagation networks can be found in Rumelhart et al. (1986), Bishop (1995) and Ripley (1996), and the implementation details of the GLM and backpropagation models for this particular study are given in Kurt (2004).

2.3.3 ARTMAP

ARTMAP was implemented in Matlab version 7 (Mathworks Inc.). All input variables were standardized to zero mean and unit standard deviation before being fed to all models except ARTMAP. For ARTMAP, the input variables were rescaled so that they lie in the unit hypercube $[0, 1]^P$, where P is the number of independent features (i.e., the dimension of the input space). Theoretical considerations for the reason to use this particular standardization for ARTMAP models are beyond the scope of this report, and interested readers are referred to Kosko (1992).
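The two kinds of standardization can be contrasted with a short sketch. The text does not state how the variables were mapped into the unit hypercube; min-max rescaling is assumed here as one common choice, and the use of the sample standard deviation is likewise an assumption. The function names and the example column are hypothetical.

```python
from statistics import mean, stdev

def zscore(column):
    """Zero mean, unit (sample) standard deviation; used for all models except ARTMAP."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]

def minmax(column):
    """Rescale to [0, 1]; assumed here as the mapping into the unit hypercube."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical elevation values in metres.
elevation = [100.0, 200.0, 300.0, 400.0]
z = zscore(elevation)
u = minmax(elevation)
```

Both functions assume a non-constant column; a constant variable would cause a division by zero and would carry no information for classification in any case.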

All six models were trained separately for each of the three bird species, and each trained model was then tested separately on the corresponding test set to assess its generalizability. All models were trained using bootstrapping and cross-validation to optimize the so-called bias-variance trade-off (Hastie et al., 2001).

3 Results and Discussion

3.1 Performance of the Models on Training and Independent Tests

The performances of all six models for the three bird species on both training and test sets are given in Table 1. For backpropagation models, the performance is given as the c-index, which is approximately the area under the ROC curve (Bishop, 1995). For the other models, the performance is given as percent correctly classified. Note that for data sets with a perfectly balanced number of data points in each output category, the percent correct measure is equivalent to the c-index (Bishop, 1995; Ripley, 1996). Hence, the performance measures of all methods in our case are comparable. Further note that, unlike traditional performance measures such as $R^2$, a value of 0.5 for percent correct and c-index indicates a performance no better than random.
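The relation between the two measures can be checked numerically. The sketch below assumes that the classifier outputs hard 0/1 labels and that tied pairs count one half in the c-index; under those assumptions the two measures coincide on balanced data, whereas for continuous scores they generally differ. The function names and the toy predictions are hypothetical.

```python
def percent_correct(y_true, y_pred):
    """Fraction of observations whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def c_index(y_true, scores):
    """Fraction of (presence, absence) pairs ranked correctly; ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    concordant = sum(1.0 if p > n else 0.5 if p == n else 0.0
                     for p in pos for n in neg)
    return concordant / (len(pos) * len(neg))

# Hypothetical balanced test set: 4 presences and 4 absences,
# with one error in each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
pcc = percent_correct(y_true, y_pred)
c = c_index(y_true, y_pred)
```

With hard labels the c-index reduces to the average of the true positive and true negative rates, which equals percent correct when the two classes have the same size.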

As evident from Table 1, the performance of the neural network models, both ARTMAP and backpropagation, was superior to that of the traditional classification algorithms. For the latter group, especially for LDA and QDA, the data corresponding to H. pallida seem to be particularly "difficult", with both models' performance on the training set being around random chance level. Among traditional classification models, although k-NN performed better on the training sets than LDA and QDA, it too suffered from low performance on the independent test sets. The performance of the backpropagation and GLM methods on the training sets was generally better than that of the previous three techniques, and it is especially noteworthy that the backpropagation model predicted all of the data points on the training sets correctly for the data sets of L. senator and C. brachydactyla. With respect to the training sets, however, the ARTMAP model performed equally well on these two data sets, and better than the backpropagation model on the H. pallida set. Note also the number of hidden units in the backpropagation models and the number of categories formed in the fuzzy ART module for input vectors (committed nodes) in the ARTMAP models (8, 3, 10 and 2, 4, 3, respectively). The number of hidden units (or equivalently, of committed nodes) indicates how well the input space is represented as a compressed code in the internal structure of the model (Ripley, 1996; Carpenter et al., 1991a). Considering that the number of compressed representations is equivalent to the degrees of freedom of the model (Bishop, 1995; Ripley, 1996), ARTMAP appears to be more effective in representing the input space than the backpropagation method, and it does so without sacrificing performance on the training set. In addition, the fewer degrees of freedom a model has, the more generalizable it would be (Hastie et al., 2001). The performances of the GLM, backpropagation and ARTMAP models on the independent test sets also reflected this, in that the predictive power of ARTMAP was considerably better than that of the other two, being close to 1 for each of the three independent test cases (Table 1). Thus, at least for the current data set, ARTMAP seems to be more robust in characterizing ecological data and predicting species occurrence, in terms of both training accuracy and generalizability.



3.2 Computational Efficiency

In addition to its superiority in terms of training and test performance, ARTMAP also has the advantage of being computationally much less expensive than feedforward backpropagation networks. For the results presented in this report, the backpropagation network required close to 1000 iterations over the complete training set, which took approximately 18 minutes on a P4 1.8 GHz PC. Because backpropagation models also require optimization of the architecture as well as of free parameters (e.g., learning rate and momentum) to achieve their best performance, with each candidate model trained separately, the computational time required grows significantly. On the other hand, the fast-learning mode of ARTMAP (Carpenter et al., 1992) enables the network to learn "one-shot deals", that is, to learn without iterating over the training set. The ARTMAP model in fast-learning mode took approximately 10 seconds on the same system to train and achieve the performances given in Table 1. In addition, ARTMAP models have only a single external parameter and consist of two separate self-organizing maps; as such, they do not require any optimization steps, which makes this family of models considerably efficient in terms of the computational time required. The non-iterative nature of the ARTMAP method also enables new observations to be incorporated into the model as soon as they are obtained, so that the model can be updated with each new observation without any considerable computational effort.

The notable performance of the ARTMAP model compared to traditional statistical classification techniques, as well as to the feedforward multilayer backpropagation method, particularly in terms of generalizability to new data sets, suggests that ART-based methods, as presented in this report, are potentially robust statistical techniques that can be used instead of already familiar methods. Considering their relatively small computational requirements compared to their closest follower, backpropagation models, ART-based models seem to be promising candidates as future predictive models in ecology.
Figure Captions



Figure 1: Schematic representation of the fuzzy ARTMAP architecture. Input vectors are processed in the $\mathrm{ART}_a$ module, while target categories are processed in the $\mathrm{ART}_b$ module. Semi-disks represent adaptive weights. For details, see text (redrawn from Carpenter et al. (1992)).
Tables



Table 1: Performance of the models on training and test sets. N: number of data points; P: number of input variables; k-NN: k-nearest neighbor; LDA: linear discriminant analysis; QDA: quadratic discriminant analysis; GLM: generalized linear model; BackProp: feedforward multilayer backpropagation network; ARTMAP: adaptive resonance theory based supervised learning. The performance is given as c-index for the backpropagation network, and as percent correctly classified for the other models (see text).

Set                         N     P    k-NN   LDA    QDA    GLM    BackProp   ARTMAP
L. senator (train)          274   12   .828   .781   .799   .859   1.00       1.00
L. senator (test)           273   12   .678   .780   .798   .781   .831       .971
H. pallida (train)          246   12   .866   .488   .496   .759   .874       1.00
H. pallida (test)           245   12   .669   .486   .502   .703   .657       .980
C. brachydactyla (train)    294   12   .847   .646   .701   .855   1.00       1.00
C. brachydactyla (test)     293   12   .765   .648   .703   .769   .809       .962